45 research outputs found

    Speaker Diarization Based on Intensity Channel Contribution

    Get PDF
    The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of localization information to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC), based on the energy of the signal arriving at each channel relative to the summed energy of all channels. We demonstrate that combining the ICC and TDOA features improves the robustness of the localization features and reduces the diarization error rate (DER) of the complete system (using localization and spectral features). With this new localization feature, we achieve relative DER improvements of 5.2% on our development data, 3.6% on the RT07 evaluation data and 7.9% on the RT09 evaluation data.
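    The ICC feature as described admits a compact sketch: for each analysis window, a channel's contribution is its signal energy divided by the summed energy across all channels. A minimal illustration (function and variable names are ours, not the paper's):

    ```python
    import numpy as np

    def intensity_channel_contribution(frames):
        """Compute ICC features for one analysis window.

        frames: 2-D array of shape (n_channels, n_samples), one row per
        microphone channel. Returns one ICC value per channel: that
        channel's energy divided by the summed energy of all channels.
        """
        energy = np.sum(frames ** 2, axis=1)   # per-channel energy
        return energy / np.sum(energy)         # relative contribution

    # Example: four channels of synthetic audio
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((4, 512))
    icc = intensity_channel_contribution(frames)
    ```

    By construction the ICC values are non-negative and sum to one, so they behave like a per-window distribution over channels, which is what makes them complementary to TDOA estimates.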

    TIMPANO: Technology for complex Human-Machine conversational interaction with dynamic learning

    Full text link
    The TIMPANO project aims to advance the development of spoken human-machine communication systems, focusing mainly on the ability to respond to multiple user requirements, such as information access, information extraction, or the analysis of large repositories of audio information. The project places special emphasis on the dynamic adaptation of models to different contexts, whether acoustic, semantic or language-related.

    GTH-UPM system for search on speech evaluation

    Get PDF
    This paper describes the GTH-UPM system for the Albayzin 2014 Search on Speech Evaluation. The evaluation task consists of searching for a list of terms/queries in audio files. The GTH-UPM system we present is based on a Large Vocabulary Continuous Speech Recognition (LVCSR) system. We used the MAVIR corpus and the Spanish partition of the European Parliament Plenary Sessions (EPPS) database to train both the acoustic and language models. The main effort focused on lexicon preparation and text selection for language model construction. The system uses different lexicons and language models depending on the task being performed. For the best configuration of the system on the development set, we obtained a FOM of 75.27 for the keyword spotting task.

    Clustering of syntactic and discursive information for the dynamic adaptation of Language Models

    Get PDF
    We present a strategy for clustering dialogue elements of semantic and discursive type. Using Latent Semantic Analysis (LSA), we group the different elements according to a correlation-based distance criterion. After selecting a set of clusters that forms a partition of the semantic or discursive space under consideration, we train a stochastic language model (LM) associated with each cluster. These models are then used for the dynamic adaptation of the language model employed by the speech recognizer embedded in a dialogue system. Using dialogue information (the posterior probabilities that the dialogue manager assigns to each dialogue element at each turn), we estimate the interpolation weights corresponding to each LM. Initial experiments show a reduction in word error rate when the information obtained from an utterance is used to re-estimate that same utterance.
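    The dynamic adaptation step — a posterior-weighted interpolation of cluster-specific LMs — can be sketched as follows. The toy unigram models, numbers and all names are illustrative assumptions, not the paper's implementation:

    ```python
    def adapted_lm_prob(word, history, cluster_lms, posteriors):
        """Probability under the dynamically adapted LM: a mixture of
        cluster-specific LMs, weighted by the posteriors the dialogue
        manager assigns to each dialogue element at the current turn."""
        return sum(w * lm(word, history) for w, lm in zip(posteriors, cluster_lms))

    # Toy unigram LMs for two hypothetical semantic clusters
    lm_travel = lambda w, h: {"flight": 0.4, "hotel": 0.3}.get(w, 0.01)
    lm_banking = lambda w, h: {"account": 0.5, "flight": 0.02}.get(w, 0.01)

    # Dialogue-manager posteriors for the current turn act as the weights
    p = adapted_lm_prob("flight", [], [lm_travel, lm_banking], [0.8, 0.2])
    # 0.8 * 0.4 + 0.2 * 0.02 = 0.324
    ```

    Because the weights are re-estimated every turn, the mixture shifts toward whichever cluster the dialogue manager currently believes is active, without retraining any of the component LMs.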

    HIFI-AV: An Audio-visual Corpus for Spoken Language Human-Machine Dialogue Research in Spanish

    Full text link
    In this paper, we describe a new multi-purpose audio-visual database in the context of speech interfaces for controlling household electronic devices. The database comprises speech and video recordings of 19 speakers interacting with a HIFI audio box by means of a spoken dialogue system. Dialogue management is based on Bayesian Networks, and the system is provided with contextual information handling strategies. Each speaker was asked to fulfil different sets of specific goals following predefined scenarios, varying both in complexity level and in the degree of freedom or initiative allowed to the user. Thanks to its careful design and size, the recorded database supports comprehensive studies on speech recognition, speech understanding, dialogue modelling and management, microphone-array-based speech processing, and both speech- and video-based acoustic source localisation. The database has been labelled for quality and efficiency studies on dialogue performance, and the whole database has been validated through both objective and subjective tests.

    Evaluation of a Spoken Dialogue System for controlling a Hifi audio system

    Full text link
    In this paper, a Bayesian Network (BN) approach to dialogue modelling [1] is evaluated against a battery of both subjective and objective metrics. A significant effort has been made to improve the system's contextual information handling capabilities. Consequently, besides typical usability measures such as task and dialogue completion rates and dialogue time, we include a new measure of the contextuality of the dialogue: the number of turns in which contextual information is helpful for dialogue resolution. The evaluation is carried out over a set of predefined scenarios with different initiative styles, focusing on the impact of the user's level of experience.

    Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers

    Get PDF
    A new language recognition technique is described, based on applying the philosophy of Shifted Delta Coefficients (SDC) to phone log-likelihood ratio (PLLR) features. The new methodology incorporates long-span phonetic information at a frame-by-frame level while accounting for the temporal length of each phone unit. The proposed features are used to train an i-vector based system and tested on the Albayzin LRE 2012 dataset. The results show a relative improvement of 33.3% in Cavg compared with several state-of-the-art acoustic i-vector based systems. In addition, we present the integration of parallel phone ASR systems, where each one generates multiple PLLR coefficients that are stacked together and then projected into a reduced dimension. Finally, the paper shows how incorporating state information from the phone ASR yields additional improvements, and how fusion with the other acoustic and phonotactic systems provides a substantial improvement of 25.8% over the system presented during the competition.
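    The SDC construction applied to frame-level features such as PLLRs can be sketched as below, using the standard d-P-k parameterisation (delta spread, shift between blocks, number of stacked blocks). The default 1-3-7 values are the common SDC choice, not necessarily those used in the paper, and all names are illustrative:

    ```python
    import numpy as np

    def shifted_delta(feats, d=1, P=3, k=7):
        """Shifted delta coefficients over a (T, N) frame-level feature
        matrix (e.g. PLLR features per frame).

        For each frame t, k delta blocks are stacked, taken at shifts of
        P frames: delta(t + i*P) = feats[t + i*P + d] - feats[t + i*P - d],
        with indices clamped at the sequence boundaries.
        """
        T, N = feats.shape
        out = np.zeros((T, N * k))
        for t in range(T):
            for i in range(k):
                hi = min(T - 1, t + i * P + d)   # clamp at last frame
                lo = max(0, t + i * P - d)       # clamp at first frame
                out[t, i * N:(i + 1) * N] = feats[hi] - feats[lo]
        return out

    # Example: 10 frames of 2-dimensional features
    feats = np.arange(20.0).reshape(10, 2)
    sdc = shifted_delta(feats, d=1, P=3, k=2)
    ```

    Each output frame thus summarises feature dynamics over roughly (k-1)*P + 2*d frames of context, which is how the scheme injects long-span phonetic information while keeping a frame-by-frame representation.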

    Facilitating Preference Revision through a Spoken Dialogue System

    Get PDF
    We present the design of a spoken dialogue system that provides feedback to users of an autonomous system capable of learning different patterns associated with user actions. Our speech interface allows users to verbally refine these patterns, giving the system their feedback about the accuracy of the actions learnt. We focus on improving the naturalness of user interventions, using a stochastic language model and a rule-based language understanding module. A state-based dialogue manager, which decides how to conduct each dialogue, together with the storage of contextual information from previous dialogue turns, allows the user to speak to the system in a highly natural way.

    Advanced Speech Communication System for Deaf People

    Get PDF
    This paper describes the development of an Advanced Speech Communication System for Deaf People and its field evaluation in a real application domain: the renewal of a Driver's License. The system is composed of two modules. The first is a Spanish into Spanish Sign Language (LSE: Lengua de Signos Española) translation module made up of a speech recognizer, a natural language translator (for converting a word sequence into a sequence of signs), and a 3D avatar animation module (for playing back the signs). The second is a spoken Spanish generator from sign writing, composed of a visual interface (for specifying a sequence of signs), a language translator (for generating the sequence of words in Spanish), and finally a text-to-speech converter. For language translation, the system integrates three technologies: an example-based strategy, a rule-based translation method and a statistical translator. This paper also includes a detailed description of the evaluation carried out in the Local Traffic Office in the city of Toledo (Spain), involving real government employees and deaf people. This evaluation includes objective measurements from the system and subjective information from questionnaires.

    Evaluation of a User-Adapted Spoken Language Dialogue System: Measuring the Relevance of the Contextual Information Sources

    Get PDF
    We present an evaluation of a spoken language dialogue system with a module for managing user-related information, stored as user preferences and privileges. The flexibility of our dialogue management approach, based on Bayesian Networks (BNs), together with a contextual information module that implements different strategies for handling such information, allows us to include user information as a new level in the Context Manager hierarchy. We propose a set of objective and subjective metrics to measure the relevance of the different contextual information sources. The analysis of our evaluation scenarios shows that the relevance of the short-term information (i.e. the system status) remains fairly stable throughout the dialogue, whereas the dialogue history and the user profile (i.e. the middle-term and long-term information, respectively) play a complementary role, their usefulness evolving as the dialogue progresses.